DOCUMENT RESUME

ED 110 490                                              TM 004 763

AUTHOR          Brandenburg, Dale C.
TITLE           The General Concept of Validity Applied to Student
                Ratings: Or, Please, General Custer, What Are We
                Doing Here?
PUB DATE        [Apr 75]
NOTE            6p.; Paper presented at the Annual Meeting of the
                American Educational Research Association
                (Washington, D.C., March 30-April 3, 1975)

EDRS PRICE      MF-$0.76 HC-$1.58 PLUS POSTAGE
DESCRIPTORS     Effective Teaching; Higher Education; Instructional
                Improvement; Student Attitudes; Student Evaluation;
                *Teacher Evaluation; *Teacher Rating; *Test Validity;
                Validity



ABSTRACT

The validity of student ratings of instructors and instruction is
discussed. The use of student ratings of teachers is increasing,
especially in the areas of rank, pay, and retention decisions. Although
the author feels that student input is a valid source of information,
the type of data that is being collected is based solely on empirical
data. The results of this data may also be affected by expected grades,
rank of instructor, peer ratings, or required courses as opposed to
electives. What is being measured by this type of data must be assessed
so that we have a more complete evaluation of teachers and/or students.
(DEP)



************************************************************************
* Documents acquired by ERIC include many informal unpublished         *
* materials not available from other sources. ERIC makes every effort  *
* to obtain the best copy available, nevertheless, items of marginal   *
* reproducibility are often encountered and this affects the quality   *
* of the microfiche and hardcopy reproductions ERIC makes available    *
* via the ERIC Document Reproduction Service (EDRS). EDRS is not       *
* responsible for the quality of the original document. Reproductions  *
* supplied by EDRS are the best that can be made from the original.    *
************************************************************************






THE GENERAL CONCEPT OF VALIDITY
APPLIED TO STUDENT RATINGS






OR






PLEASE, GENERAL CUSTER,
WHAT ARE WE DOING HERE?¹

Dale C. Brandenburg

University of Illinois at Urbana-Champaign



Sooner or later when one examines the validity of a given measuring
instrument, the question of "valid for what purpose?" arises. Validity
of student ratings of instructors and instruction is no exception. As
specified in the new issue of Standards for Educational and Psychological
Tests (American Psychological Association, 1974): "Questions of validity
are questions of what may be properly inferred from a test score;
validity refers to the appropriateness of inference from test scores or
other forms of assessment" (p. 25), and later, "It is important to note
that validity is itself inferred, not measured."

To get to my point rather directly, can we infer from our present
knowledge and technology based on student ratings that one instructor is
a more effective teacher in a given subject than another, as we can infer
from our present knowledge and technology based on achievement tests that
one examinee possesses more knowledge of a given topic than another
examinee? I think not; and yet the use of student ratings for rank, pay,
and retention decisions is becoming more frequent. This administrative use
of student ratings is undoubtedly the weakest area for the validity of
student ratings from a technical viewpoint.

Before I return to this point, I want to talk about two other general
purposes that involve the use of student ratings: improvement of
instruction and information to guide the student in selection of courses.
The latter purpose is probably less susceptible to validity concerns than
the former. Published student ratings undoubtedly provide students with
the most solid and reliable piece of information they have for the
selection of instructors and courses, given obviously the choice of such.

However, when one considers the other two purposes of student ratings,
improvement of instruction and personnel decisions, we must admit that
validity data are weak, contradictory, and consequently they do not lead us
to discover any convincing definition for an effective teacher. In the
meantime, we advocate to new users developing faculty evaluation systems
that student input is a necessary, valid source of information.

¹Part of a symposium, Conceptualizations of validity for student
ratings of instruction, presented at the AERA Annual Meeting, April 2, 1975.

I agree that student data are valid input, but let's look at the type
of data we are collecting. Most items on student rating questionnaires are
concerned with process-type variables. Recent approaches by Hoyt (1969),
Hogan and Hartley (1972), and Jaeger and Freijo (1974) have begun to focus
on outcomes (more appropriately, student-viewed outcomes). Which type of
variable is more important for a given purpose: process or outcome? We
can argue more about this later; but for the moment, let's consider what
we are doing. Most questionnaires, to my knowledge, have been formulated
from empirical bases; other instruments may have begun by asking certain
groups of individuals for their definition of an effective teacher. I
suggest that this latter approach is for the most part also empirical,
i.e., based upon past experience and perception.

How many student rating instruments do you know that have been based
on something called a theory of instruction? I don't know of any. It seems
that we have relied upon something based entirely on empiricism, and
we have proceeded for a long time to draw inferences from our empirical data
base. The consequences have not, thus far, been chaotic, but I don't think
we know where we're heading. It reminds me of what Suppes (1974, p. 6) said
here last year: "Reliance on bare empiricism or bare intuition in educational
practice is a mental form of streaking, and nudity of mind is not as appealing
as nudity of body."

To illustrate my point, a recent careful, comprehensive review of the
teacher evaluation literature since 1929 (Batista, Characteristics of
effective college teachers, unpublished manuscript, December, 1974) found
that the following four general types of items appear in almost every
student rating form:

1. Subject matter: Knowledge of subject matter and enthusiasm for it.

2. Classroom performance: Organization of lectures, ability to
communicate, and clearness of presentation.

3. Relations with students: Awareness of student needs and interests,
acceptance of students' points of view and facilitation of student
participation.

4. Student learning: Stimulation of students' thinking, capability of
promoting students' interest in the field and facilitation of students'
personal and professional growth.

It is not difficult to discern that what we have above can generally be
thought of as process variables. It was Hogan and Hartley's paper (1972)
that got us thinking more about outcomes. What had we been doing
previously? The GIGO principle must have been working well in our factor
analytic studies of student ratings (garbage in, garbage out). Why is it
that this factor was undiscovered, or could we have been relying too
heavily on empiricism?







Still I can agree that our present best single piece of information in
the evaluation of instructor performance, i.e., the teaching component, is
student ratings of performance. We must realize, however, that students'
ratings are an indirect evaluation of teaching. Direct evaluation (or primary
evaluation as Scriven [unpublished manuscript, 1974] defines it) is
"determination of the gains in understanding or learning or attitudes
resulting from a particular exposure to teaching, possibly combined with
some direct checks on the justice and pleasure of the classroom process"
(p. 9). About indirect or secondary evaluation, Scriven states: "One needs
to face the fact that almost all secondary evaluation (of teaching) is
extremely unreliable [defined as the ability to identify a superior teacher]
and that a great deal of it is not only completely worthless, but so
unreliable that its use does more harm than good" (p. 9). Scriven also
complains that at present there exists no "defensible global (synthesized)
rating of teaching merit" (p. 10). To complete the argument on secondary
evaluation, Scriven states: "The basic axiom here is, of course, that it is
logically impossible for any indications (and hence any secondary
evaluation) to be validated unless some primary evaluation is done" (p. 10).
We know that there exist some studies which have shown that student ratings
are related to student learning or achievement, but here again the results
are contradictory. If we forget about Rodin and Rodin's study, and I am
inclined to do that, we are still left hanging a bit with the present
studies. The reason for this is that none of these studies have utilized
the normal college classroom as the unit of analysis. I argue for this
because I do not consider teaching assistants and their classes to be
reflective of the mainstream of college instruction.
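
To make the phrase "classroom as the unit of analysis" concrete, here is a
minimal Python sketch. It assumes a simple data layout that does not come
from any of the studies cited: one record per student, holding a class
label, a rating, and an achievement score. The function names and the
sample numbers are hypothetical. Ratings and scores are averaged within
each class first, and the correlation is then taken across class means
rather than across individual students.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def class_level_correlation(records):
    """Correlate mean rating with mean achievement, one point per class.

    records: iterable of (class_id, rating, achievement), one per student.
    """
    by_class = defaultdict(list)
    for class_id, rating, achievement in records:
        by_class[class_id].append((rating, achievement))
    # Collapse each class to a single (mean rating, mean achievement) point.
    mean_ratings = [mean(r for r, _ in rows) for rows in by_class.values()]
    mean_scores = [mean(a for _, a in rows) for rows in by_class.values()]
    return pearson(mean_ratings, mean_scores)


# Hypothetical data: three classes, two students each.
records = [
    ("A", 4, 70), ("A", 5, 80),
    ("B", 2, 60), ("B", 3, 50),
    ("C", 3, 65), ("C", 4, 75),
]
print(round(class_level_correlation(records), 2))
```

With only a handful of classes such a coefficient is very unstable, and
the sketch says nothing about whether the classes sampled are
representative of ordinary college instruction, which is the concern
raised above.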

In the total scheme for the evaluation of teaching, there is a place
for student rating. Even Scriven (1974, p. 11) agrees that student ratings
should be given some weight because "there is a commonsensical and moral
basis for expecting them to be at least weakly correlated to success as a
teacher." In order to move forward on research in this area (to borrow
Scriven's terms) we must distinguish between indicator variables (factors
that are correlated with good teaching) and improvement variables (factors
that can be emulated and that control teaching merit). In the review of
literature cited previously, Batista (1974) formulates a definition of
an effective teacher from a composite of student rating results. It is an
instructor who is "cultural and/or knowledgeable, enthusiastic about his
subject area, emotionally stable, concerned for the welfare of his students,
able to communicate his ideas clearly, and capable of stimulating students'
learning and growth" (p. 64). If we believe that our clients, the college
faculty at large, should try to emulate the current profiles of student
rating results of our best teachers, we may be "at best fruitless and
possibly counter productive" (Scriven, p. 22) in trying to improve
instruction. This, then, is the primary reason why we should not include
process items on questionnaires whose results are going to be fed into
personnel decisions.

Let us consider the following questions: 

1. Are student ratings related to colleague ratings?

2. Do students in required courses rate differently than students in 
elective courses? 






3. When is the proper time to administer ratings?

4. Do present student ratings correlate with alumni ratings?

5. Do student ratings on one form correlate with those from another?

6. Do ratings differ depending on the rank of the instructor?

7. Do ratings differ among subject matter areas?

8. Are student ratings related to cognitive achievement?

9. How much does expected grade influence the ratings?

While I do not deny the importance of some of these questions, they are
concerned, with one exception (#8), with indirect evaluation, and none have
to do with improvement variables. Yet we advocate the use of student ratings
for improvement of instruction and administrative decisions. Are we
improving instruction or are we really improving the ratings? Are we
promoting better instructors or instructors with better ratings? What
standard do we have to go by, i.e., can we tell an instructor that if he
does this he will be a better instructor and to hell with what the ratings
say? In general, I'll have to agree with Menne (NCME Measurement News,
April, 1974), who stated that what we are measuring is teacher performance
and not teacher effectiveness. It appears that Dr. Fox is a grand example
of where we should not be going.

To return to the introduction, are our inferences from these measures
appropriate, i.e., valid, or are we perpetrating circular empiricisms? I
believe that there is some room for debate.






References 

American Psychological Association. Standards for educational and
psychological tests. Washington, D.C., 1974.

Batista, E. E. Characteristics of effective college teachers. Unpublished
manuscript, Measurement and Research Division, Office of Instructional
Resources, University of Illinois at Urbana-Champaign, 1974.

Hogan, T. & Hartley, E. Some additional factors in student evaluations of
courses. American Educational Research Journal, 1972, 9, 241-250.

Hoyt, D. P. Student reactions to instruction and courses. Unpublished
manuscript, Office of Educational Resources, Kansas State University, 1969.

Jaeger, R. M. & Freijo, T. D. Some psychometric questions in the evaluation
of professors. Journal of Educational Psychology, 1974, 66, 416-423.

Menne, J. W. Teacher evaluation: Performance or effectiveness? NCME
Measurement News, April 1974, 11-12.

Scriven, M. The evaluation of teaching. Unpublished manuscript, University
of California at Berkeley, 1974.

Suppes, P. The place of theory in educational research. Presidential
address to the American Educational Research Association, April 17, 1974.
Educational Researcher, 1974, 3, 3-10.



